This paper presents a statistical analysis of the structure of Peer-to-Peer (P2P) social networks
that capture social associations of distributed peers in resource sharing. Peer social networks
appear to be mainly composed of pure resource providers that guarantee high resource availability
and reliability of P2P systems. The major peers that both provide and request resources are only
a small fraction. The connectivity between peers, including undirected, directed (out and in) and
weighted connections, is scale-free, and the social networks of all peers and of major peers are
small-world networks. The analysis also confirms that peer social networks show in general
disassortative correlations, except that active providers are connected with each other and with
active requesters. The study presented in this paper gives a better understanding of peer
relationships in resource sharing, which may help in designing future P2P networks and opens the
path to the study of transport processes on top of real P2P topologies.

PACS numbers: 89.75.Fb,89.20.Hh,89.20.-a 



I. INTRODUCTION 

In the last several years, many systems have been analyzed,
unraveling the way in which their constituents interact with
each other. Surprisingly, many seemingly diverse phenomena
found in biological, social and technological systems share a
complex interaction topology that is in most cases
characterized by the existence of a few key nodes that
participate in a large number of interactions. This
observation is in sharp contrast to previous studies that, in
order to model the dynamical aspects of biological, social and
technological processes, assumed a regular or a random
distribution of interactions for the system's units.
Obviously, the new approach to the topology of networked
systems has important bearings on their dynamics and
functioning, as has been pointed out during the last few
years. A first step is then the characterization of the
topological properties in order to get better insights into
the dynamics, functioning and new designs of natural and
man-made networked systems.

Peer-to-Peer (P2P) networks form a kind of open,
decentralized overlay network on top of the Internet, on
which distributed users communicate directly to find and
share resources, often music and movie files. These networks
may be among the largest distributed computing systems ever
built and, more surprisingly, they run with great stability
and resilient performance in the face of possibly the most
ferocious dynamics. The number of hosts running on Gnutella
was reported to be 1,800,000 in August 2005 [5]. Recent
studies have extensively investigated the traffic, shared
files, queries and peer properties of some widely applied P2P
systems such as Gnutella and Kazaa. It has also been reported
that node connectivity (the number of partners a node
interacts with) in Gnutella follows a combination of a
power-law distribution (usually for nodes with more than 10
connections) and a quasi-constant distribution (for nodes
with fewer connections). This may be due to the arbitrarily
created connections: peers establish connections to others by
searching for presently available peers on the overlay, in
addition to a few links to well-known hosts provided by the
system. Peer connections in these systems only suggest routes
of traffic and usually have no relation to peer properties,
e.g., peer interests or resources held by peers.

Recent literature proposed P2P social networks to capture
social associations of peers in resource sharing [13].
Similar to human social networks, a P2P social network is a
collection of connected computing nodes (peers), each of
which is acquainted with some subset of the others. The
social connections of peers indicate that a peer is a
resource provider or can provide information about resource
providers to another peer. Connection strengths express the
acquaintanceship or utility of a peer to another, i.e., how
useful one peer is to another in resource sharing. Although
P2P systems are becoming more and more significant in
distributed applications, there is little knowledge about how
peers are socially connected to function together. A
preliminary investigation confirmed that when peers were
organized according to their social relationships (instead of
arbitrarily connected links such as those created in
Gnutella), the resulting P2P networks had clearly improved
search speed and success rate. Moreover, the structure of P2P
social networks was shown to be directed, asymmetric and
weighted.

This paper provides a more comprehensive analysis of peer
social networks. In particular, we report on properties such
as degree distribution, clustering coefficient, average path
length, betweenness and degree-degree correlations. This
analysis, on the one hand, will give a better understanding
of peer associations in resource sharing and provide hints
for future P2P network design. On the other hand, the detailed
analysis of the structure of the networks addressed here will
enable simulations of transport and other processes relevant
to this kind of network.

TABLE I: Topological properties of three (out of six studied) original and major peer social networks.



II. PEER-TO-PEER SOCIAL NETWORKS 

Several P2P social networks were constructed based on 
real user information collected from the Gnutella system. 

An experimental machine running a revised Gnucleus, a kind of
Gnutella client, joined the Gnutella network as a super-node,
so that it could be connected to by more normal peers and by
many other super-nodes, each of which was also connected to
by hundreds of normal peers. In order not to disturb the
actual social links between peers, the experimental node
neither provided any shared content nor sent queries for
resources. It acted as a pure monitor to record the traffic
passing through it. In particular, it recorded information
such as which peer answered a query of which other peer,
indicating that the former may be a useful contact to the
latter. The experimental Gnucleus node ran on the Gnutella
network for between 5 hours and 3 days. It usually connected
to 300 normal peers and 30 other super-nodes. The traffic
data it recorded involved 1,000 to 200,000 peers. These data,
obviously, only reflected associations of a small group of
peers in the Gnutella system within a limited period of time.
The Gnutella system should be continuously sampled at
multiple points in order to obtain a more accurate and global
picture of peer associations.

The possible social links between peers were extracted from
the collected raw data to form the corresponding P2P social
networks. A directed connection was created from peer A to
peer B if B answered a query of A. The strength or weight of
this connection indicated how many of A's queries B answered.
The stronger a connection is, the more important the end peer
is to the start peer of the connection. A connection strength
of 1 suggests a single communication, and hence a weak
association. A constantly high strength suggests that the end
peer is a frequent resource provider of the start peer, and
hence a long-term and possibly permanent social relation. The
connection strength, however, may decay over time in the
absence of any contribution from the end peer. This issue has
been discussed further elsewhere.
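
The construction rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the input format (a list of `(requester, answerer)` pairs, one per answered query) and the function name `build_social_network` are assumptions made here, and time decay of strengths is not modeled.

```python
from collections import defaultdict


def build_social_network(answers):
    """Return weights[a][b] = number of queries from peer a answered by peer b.

    `answers` is an iterable of (requester, answerer) pairs; this record
    format is a hypothetical stand-in for the monitored traffic log.
    """
    weights = defaultdict(lambda: defaultdict(int))
    for requester, answerer in answers:
        if requester != answerer:  # assumption: self-answers are ignored
            weights[requester][answerer] += 1
    # Convert to plain dicts so that lookups do not create new entries.
    return {a: dict(nbrs) for a, nbrs in weights.items()}


# Toy log: B answers A twice, C answers A once, B answers D once.
records = [("A", "B"), ("A", "B"), ("A", "C"), ("D", "B")]
network = build_social_network(records)
print(network)
```

In this toy log the link from A to B has strength 2, while the links from A to C and from D to B have strength 1, i.e., weak associations in the sense described above.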

As P2P social networks are directed and the connection
strengths indicate peer affinity, this paper studies P2P
social networks with respect to their undirected, directed
(out and in) and weighted connections. Of particular interest
are the results obtained when the edges are considered
weighted. As most networks in real systems are weighted,
their full description is expected to provide a better and
more accurate basis for their study and modeling. However,
the investigation of weighted networks, including
communication networks, is still a new area in network
modeling and has only been addressed recently.

Table I lists the numbers of nodes ($N$) and edges ($E$) of
three out of the six P2P social networks studied (marked as
SN1 original to SN6 original) collected from Gnutella, both
at a magnitude of 10^ ~ 10^. The other three are not shown
for reasons of space, but exhibit the same statistics as
those discussed henceforth. Among tens or hundreds of
thousands of peers, only a few acted as both requesters and
providers. These peers play a major role in P2P social
networks as they contribute essential links to the networks,
and are hence called major peers. Table I also shows the
information on the social networks of major peers (marked as
SN1 major to SN6 major), extracted from the corresponding
original social networks. The number of major nodes and their
edges is only of 10^ ~ 10^. For instance, the number of nodes
in the major network obtained from SN1 drops from 42,186 to
only 221. In the remainder of this paper, both the original
P2P social networks and the major peers' social networks are
investigated.



III. STATISTICAL ANALYSIS 
A. Connectivity properties 

Table I summarizes the average degree $\langle k \rangle$ and
the ranges of out degrees $k_{out}$ and in degrees $k_{in}$
for the unweighted representations of the P2P networks
analyzed. In the case of weighted representations, the table
shows the average weighted degree or strength
$\langle k_w \rangle$, with
$k_w = \sum_j w_{ij} + \sum_j w_{ji}$ for peer $i$, and the
ranges of the weighted out degree $k_{w,out}$ (the first term
in the sum) and weighted in degree $k_{w,in}$ (the second
term in the sum) of the original and major P2P social
networks studied. Here, $w_{ij}$ is the weight of the $ij$
link and means that $j$ answered $w_{ij}$ queries from $i$.
The average connection weight $\langle w \rangle$, the weight
range $w$ and the number of symmetric links are also listed
in this table.
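
As an illustration of the quantities just defined, the following sketch computes out and in degrees, weighted out and in degrees, the mean undirected degree and the number of symmetric links. It assumes the network is stored as a nested dictionary `weights[i][j]` (queries from `i` answered by `j`); this layout and the function names are choices made for the example, not taken from the study.

```python
def degree_stats(weights):
    """Per-peer k_out, k_in and their weighted counterparts kw_out, kw_in."""
    peers = set(weights) | {j for nbrs in weights.values() for j in nbrs}
    stats = {p: {"k_out": 0, "k_in": 0, "kw_out": 0, "kw_in": 0} for p in peers}
    for i, nbrs in weights.items():
        for j, w in nbrs.items():
            stats[i]["k_out"] += 1    # i requested from one more provider
            stats[i]["kw_out"] += w   # first term of the strength sum
            stats[j]["k_in"] += 1     # j served one more requester
            stats[j]["kw_in"] += w    # second term of the strength sum
    return stats


def symmetric_links(weights):
    """Count peer pairs linked in both directions (peer ids must be orderable)."""
    return sum(
        1
        for i, nbrs in weights.items()
        for j in nbrs
        if i < j and i in weights.get(j, {})
    )


# Toy network: A and B answer each other; C and D are one-way contacts.
weights = {"A": {"B": 2, "C": 1}, "B": {"A": 1}, "D": {"B": 4}}
stats = degree_stats(weights)
mean_degree = sum(s["k_out"] + s["k_in"] for s in stats.values()) / len(stats)
print(stats["B"], symmetric_links(weights), mean_degree)
```

The undirected degree is taken here as $k_{out} + k_{in}$, which matches the statement that an average of 4.3 neighbors corresponds to 2.15 out and 2.15 in connections.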

Each peer in the original peer social networks has an average
of 4.3±0.22 neighbors. This also means that on average a peer
has 2.15 out connections and 2.15 in connections. This number
slightly increases with the number of peers, but is very
small compared with that of a fully connected network of the
same size, $\langle k \rangle = N - 1$. Some peers, however,
have from nearly three thousand up to tens of thousands of
out connections (i.e., resource providers), while the maximum
number of connected resource requesters (i.e., in degree) of
a peer is only hundreds up to one thousand. This suggests
that there are generally more available providers, though a
provider only serves a small fraction of peers in the
network. The average weighted degree is around 9 to 12 per
node and the average connection weight is around 2.3. That
is, a peer contacts another about 2.3 times on average,
though in reality a peer can answer another peer's requests
as many as ten thousand times.

Similar results are found in major peers' social networks.
The social networks of major peers are denser than the
original ones, as the average connectivity is almost doubled
among major peers. The average connection strength of major
social networks is nearly the same as that of the original
social networks, suggesting that the average level of peer
acquaintance is independent of network size. While there are
hundreds of connections present in the network, only a few of
them are symmetric: fewer than 0.03% of all connections, and
all the symmetric connections are between major peers. This
shows that real peer social networks are extremely
asymmetric: while one peer is a useful social contact to
another, it is seldom the case that the other regards the
first as its useful supplier.

FIG. 1: Cumulative undirected, out and in degree distributions
for three P2P networks and their weighted representations.
Values of the exponents characterizing the (power-law)
distributions are reported in Table III. Note that although
SN1, SN5 and SN6 are different networks, they all fall on
what seems to be a universal curve.

Table II lists the percentage of peers that have 0, 1, 2 and
more out and in connections in both original and major social
networks. Significantly, 98.5% of peers have no out neighbors
at all. These peers are pure providers that never requested
anything. Accordingly, only 0.86% of peers did not answer any
request of others; 68.5% of the peers answered one query and
more than 30% of peers answered more. A similar phenomenon
has also been found in major peers' networks. The above
result, namely the fact that there are many more resource
providers than requesters, points to an important structural
property that may be at the root of the high reliability of
Gnutella despite the system's extreme dynamics and
uncertainty.

TABLE II: Percentage of peers with null, 1, 2 and more out and in degrees. Note that there are many more resource providers
than requesters.

$k$              0             1             2             >2

Out (original)   98.5±0.02%    0.1?±0.04%                  <1.27%

In (original)    0.86±0.03%    68.5±4.3%     14.6±1.7%     <16.1%

Out (major)      42±2.6%       17.7±1.6%     8.6±1.1%      <31.7%

In (major)       15±2.5%       33±2.4%       15.2±1.2%     <36.8%

The degree distributions of undirected, out and in
connections have also been investigated. Fig. 1 illustrates
unweighted and weighted degree distributions of the original
social networks SN1, SN5 and SN6 respectively. (Social
networks of major peers present very similar degree
distributions, so they are not shown here for lack of space.)
The results confirm that peer social networks follow
power-law distributions; the exponents are summarized in
Table III.
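
The text does not state how the exponents were fitted. As one possible approach, the sketch below builds the cumulative distribution plotted in Fig. 1 and estimates a power-law exponent with the standard continuous maximum-likelihood (Hill-type) estimator, $\gamma = 1 + n / \sum_i \ln(x_i / x_{min})$. This estimator is swapped in here for illustration and is only approximate for integer degrees; the function names and the synthetic test data are assumptions.

```python
import math
import random


def ccdf(values):
    """Return (x, fraction of values >= x) for each distinct x, ascending."""
    ordered = sorted(values)
    n = len(ordered)
    return [
        (x, (n - i) / n)
        for i, x in enumerate(ordered)
        if i == 0 or x != ordered[i - 1]
    ]


def powerlaw_exponent_mle(values, x_min):
    """Continuous maximum-likelihood estimate of gamma for P(x) ~ x**(-gamma).

    Only values >= x_min are used; at least one of them must exceed x_min,
    otherwise the sum of logarithms is zero.
    """
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


# Synthetic check: draw samples with a known exponent by inverse transform.
rng = random.Random(0)
true_gamma = 2.5
samples = [(1.0 - rng.random()) ** (-1.0 / (true_gamma - 1.0)) for _ in range(20000)]
estimate = powerlaw_exponent_mle(samples, x_min=1.0)
print(round(estimate, 2))
```

Note that a density $P(k) \sim k^{-\gamma}$ corresponds to a cumulative distribution decaying as $k^{-(\gamma - 1)}$, which matters when exponents are read off cumulative plots.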

It is worth noting that a universal exponent has been
obtained for each group of networks (see Fig. 1); namely, P2P
social networks show the same exponent of the degree
distribution for undirected connections regardless of their
specific characteristics (e.g., size, number of edges, etc.),
and the same holds for directed and weighted distributions.
Moreover, weighted networks exhibit degree distributions
similar to those of unweighted networks, though statistically
different as far as the exponent of the power-law
distribution is concerned. For the six peer social networks
and corresponding major networks, the out degree
distributions have an average exponent of
$\gamma \approx 1 < 2$, and both in and undirected degree
distributions have an exponent $\gamma > 2$. This is an
interesting feature, as $\gamma = 2$ forms a dividing line
between networks with two different dominating behaviors.
Hence the different power-law distributions obtained here
suggest that the average properties of peer social networks
are dominated by (requesting) individuals that have a large
number of providers, while providing peers with fewer
connected requesters dominate the provision flow of
resources.
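
The role of $\gamma = 2$ as a dividing line can be seen from the mean of a power-law distribution. The following is a short sketch under a continuous approximation; the lower cutoff $k_{min}$ and upper cutoff $k_{max}$ are introduced here only for illustration:

```latex
\begin{align}
P(k) &= \frac{\gamma - 1}{k_{min}} \left( \frac{k}{k_{min}} \right)^{-\gamma},
      \qquad k \ge k_{min},\ \gamma > 1, \\
\langle k \rangle &= \int_{k_{min}}^{\infty} k \, P(k) \, dk
      = \frac{\gamma - 1}{\gamma - 2} \, k_{min}
      \qquad (\gamma > 2), \\
\langle k \rangle &\sim k_{max}^{\,2 - \gamma}
      \qquad (1 < \gamma < 2,\ \text{distribution cut off at } k_{max} \gg k_{min}).
\end{align}
```

For $\gamma > 2$ the mean is finite and set by the many low-degree nodes, whereas for $\gamma < 2$ it grows with the largest degree present, i.e., it is dominated by the few most connected nodes.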

Recent studies reported that the underlying peer-to-peer
Gnutella network has a degree exponent less than 2 [30],
contrary to the undirected degree exponent of P2P social
networks found in our work. While global information exchange
mechanisms are closely related to networks with exponent
$\gamma < 2$, P2P social networks may involve more local
interactions between associated peers. However, peer social
networks will not prevent global interaction and information
diffusion (e.g., web caches) if required. It would be
interesting to see the performance and topological changes
when P2P social networks are combined with those global
mechanisms.



B. Average shortest path lengths and betweenness 

The shortest distances between all pairs of peers that have
(directed) paths from one to another are calculated. The
average shortest path lengths in the original and major
social networks are around 6.6 and 4.6 respectively, as shown
in Table I. Here the rule of six degrees of separation still
holds in spite of the huge sizes and sparseness of the peer
social networks. The social networks of major peers are
obviously better connected. In general, a major peer can
reach another randomly chosen major peer in around 4.6 steps.
The smaller average shortest path length of major peers is of
the order one may expect from the logarithmic dependency of
$\langle l \rangle$ on $N$ in small-world networks. Another
possible explanation is that major peers show disassortative
correlations. This kind of correlation arises when nodes of
different degrees are likely to be connected. That is, there
is no core that concentrates all major peers. Otherwise, one
would expect a greater decrease in the average shortest path
lengths than that observed. This hypothesis will be confirmed
in the following analysis of degree-degree correlations,
which shows that, within statistical fluctuations, peer
social networks are mainly disassortative.
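
The path statistics described above can be sketched with a breadth-first search from every peer, averaging only over ordered pairs that are actually connected by a directed path. The adjacency-list layout and the function name are assumptions of this example; the longest distance found corresponds to the network diameter in the sense used in this section.

```python
from collections import deque


def path_length_stats(adj):
    """Return (average shortest path length, longest shortest path).

    `adj` maps each peer to the peers its outgoing links point to. Only
    ordered pairs joined by a directed path contribute to the average.
    """
    total = pairs = longest = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # unweighted breadth-first search
            v = queue.popleft()
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for target, d in dist.items():
            if target != source:
                total += d
                pairs += 1
                longest = max(longest, d)
    return (total / pairs if pairs else 0.0), longest


# Toy directed network: a 3-cycle A -> B -> C -> A with an extra link C -> D.
adj = {"A": ["B"], "B": ["C"], "C": ["A", "D"]}
average, diameter = path_length_stats(adj)
print(average, diameter)
```

Each search costs time proportional to the number of edges, so the full computation scales as the number of peers times the number of edges, which is still feasible for sparse networks of the sizes discussed here.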

The average path lengths of both original peer social
networks and major peer social networks are much smaller than
those for a regular two-dimensional lattice of the same size,
which range from tens to hundreds. It has been found that the
average distances vary logarithmically with the number of
individuals in some kinds of social networks, including
scientific collaboration networks. Unfortunately, our data
are too sparse to confirm or reject this. (However, as shown
in the tables, $\langle l \rangle$ is certainly small in all
cases.) Analysis of more peer social networks may be helpful.

The maximum distance $l_{max}$ between connected peers, or
diameter of the network, is on average 14.5 for original
social networks and 12.5 for major peer networks. This
suggests that connected peers in these networks can be
reached by a chain of at most 15 or 13 acquaintances. Fig. 2
illustrates the frequency of the shortest paths in social
networks SN1, SN5 and SN6 respectively. These shortest path
distributions have a long tail, which distinguishes peer
social networks from random networks with the same number of
nodes and edges. The long tail of the shortest path
distribution has been reported as a property of small world
networks.

TABLE III: Exponents $\gamma$ for undirected, directed and weighted representations of P2P social networks.

$\gamma$              Undirected    Out          In

Original unweighted   2.1±0.07      0.95±0.12    2.6±0.13

Major unweighted      2.53±0.096    1.14±0.18    2.65±0.062

Original weighted     2.98±0.026    0.92±0.09    2.2±0.11

Major weighted        2.13±0.1      1.03±0.14    2.2±0.14

FIG. 2: Frequency of average shortest path lengths in major
peer social networks.

FIG. 3: Cumulative betweenness distribution of the undirected
representation of three major P2P networks.

A property closely related to the distribution of average
shortest path lengths is the betweenness. The betweenness
measures the centrality of a node in a network and allows
exploration of the influence a node has over the spread of
information through the network. It is normally calculated as
the fraction of shortest paths between node pairs that pass
through the node of interest. Betweenness is commonly applied
in social network analysis, and has recently been introduced
for load analysis in scale-free networks [18]. A direct
calculation of peer betweenness in the original peer networks
is rather laborious due to the enormous number of peers
involved. Here only the average betweenness
$\langle b \rangle / N$ of the major peers' social networks is
presented, as listed in Table I. The average betweenness over
major peers is between $0.3N$ and $N$, indicating that the
social networks are not dominated by a few highly connected
peers.
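
For networks the size of the major peer networks, betweenness can be computed exactly. The sketch below uses Brandes' accumulation scheme for unweighted directed graphs, which is a common choice and not necessarily the procedure used in this study. It returns, for each peer, the sum over ordered source-target pairs of the fraction of shortest paths passing through that peer; no normalization by $N$ is applied, and the names are assumptions of the example.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness of every node in a directed, unweighted graph."""
    nodes = set(adj) | {w for nbrs in adj.values() for w in nbrs}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes inward.
        delta = dict.fromkeys(nodes, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Two equally short routes from a to d: each middle node carries half a path.
diamond = betweenness({"a": ["b", "c"], "b": ["d"], "c": ["d"]})
chain = betweenness({"a": ["b"], "b": ["c"]})
print(diamond, chain)
```

For an undirected representation stored with both directions of every link, each unordered pair is counted twice, so the values would be halved.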

We further investigated the betweenness distribution $p(b)$,
the probability that any given peer is passed over by $b$
shortest paths (see Fig. 3), and the relationship between the
average betweenness of a peer and its connectivity $k$ (see
Fig. 4). Again, no clear power-law decay for the former or
linear increase for the latter has been found, unlike what
was previously reported for other networks [17, 18]. In our
case, the fact that $b_k$ does not scale with $k$, and hence
the lack of any correlations important for information
traffic and delivery, is another indication of the unique
topological properties of these networks, making their
functioning very reliable and robust. It is worth noting at
this point that an interesting and relevant issue to be
explored more carefully in future work is whether or not
self-averaging holds in these systems. While Figs. 5 and 3
may suggest the lack of self-averaging, they correspond to
major networks, which are still too small to draw definitive
conclusions. Moreover, the intrinsic dynamic nature of these
networks may perfectly reconcile network properties that are
not sample-dependent (e.g., global properties such as degree
distributions) with other local metrics that depend on the
sampling (such as those depicted in Figs. 2 and 3).



C. Clustering coefficient 

The clustering coefficient is an important local network
property that measures how well the neighbors of any node in
a network are locally connected. Table I gives the values of
the clustering coefficients of the networks studied here.
Original peer social networks possess a similar clustering
coefficient $\langle c \rangle \approx 0.02$. This small
number suggests that peer neighbors are not closely
connected, i.e., only a few neighbors regard others as their
acquaintances. However, the clustering of peer social
networks is higher than that of ER random graphs with the
same size and average connectivity, whose clustering
coefficients,
$\langle c \rangle_{rand} = \langle k \rangle / N$, are three
orders of magnitude smaller than those of the peer social
networks. At the same time, the estimate for the clustering
coefficient might be consistent with that of random graphs
with scale-free degree distribution. Compared with the
original social networks, major peers show closer
relationships with each other. The clustering coefficients of
major peers are nearly 0.1, one to two orders of magnitude
larger than those of their corresponding random graphs. Thus
the active players in peer social networks, which both
provide and request resources, are themselves relatively well
connected.

FIG. 4: Betweenness $b_k$ as a function of the peer's
connectivity $k$. Note the lack of any scaling of $b_k$ with
$k$. See the text for further details.
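
As a concrete reading of the definition, the sketch below computes the local clustering coefficient $c_i = 2 e_i / (k_i (k_i - 1))$, where $e_i$ is the number of links among the $k_i$ neighbors of peer $i$. It assumes the undirected representation of the network; whether directions were also ignored in the study is not stated, and the function name is an assumption of this example.

```python
def clustering_coefficients(edges):
    """Local clustering coefficient of every node, ignoring link direction.

    Node identifiers must be orderable so that each neighbor pair is
    counted once. Nodes with fewer than two neighbors get 0.0.
    """
    nbrs = {}
    for a, b in edges:
        if a != b:  # skip self-loops
            nbrs.setdefault(a, set()).add(b)
            nbrs.setdefault(b, set()).add(a)
    coeffs = {}
    for v, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            coeffs[v] = 0.0
            continue
        links = sum(1 for u in ns for w in ns if u < w and w in nbrs[u])
        coeffs[v] = 2.0 * links / (k * (k - 1))
    return coeffs


# Triangle A-B-C with a pendant peer D attached to C.
coeffs = clustering_coefficients([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
mean_c = sum(coeffs.values()) / len(coeffs)
print(coeffs, mean_c)
```

The network average $\langle c \rangle$ is the mean of these values; averaging instead over all peers of the same degree $k$ gives a degree-dependent clustering coefficient $C_k$.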

The clustering coefficients remain constant for peer social
networks or major peers' social networks of different sizes,
suggesting there may be a unique value for them, a property
that has been observed in other systems as well. Moreover,
the highly clustered property and short paths between
distributed peers (as introduced in Section III.B) confirm
that peer social networks are small worlds, like other
natural or artificial networks such as ecosystems, human
societies and the Internet.

Studies on scientific collaboration networks and Internet
topologies reported a power-law relationship between the
average clustering coefficient $C_k$ over nodes of degree $k$
and the degree itself, that is, $C_k$ decays as a power of
$k$. Fig. 5 plots $C_k$ of some original peer social networks
in relation to peers' undirected, out and in degrees. A clear
power-law form is difficult to claim in our data.
Nevertheless, the non-flat clustering coefficient
distributions shown in the figures suggest that the
dependency of $C_k$ on $k$ is nontrivial, and thus point to
some degree of hierarchy in the networks. Further study of
the hierarchy of social networks will clarify this point and
will be undertaken in future work.



FIG. 5: Cumulative clustering coefficient $C_k$ as a function
of undirected, out and in degrees $k$.



D. Degree-degree correlations

Networks with assortative mixing are those in which nodes
with many connections tend to be connected to other nodes
with many connections and vice versa. Technological and
biological networks are in general disassortative, and social
networks are often assortatively mixed, as suggested by the
study on scientific collaboration social networks. In
contrast to this, however, Internet dating communities, a
kind of social network embedded in a technological one,
displayed a significant disassortative mixing [19]. This
seems to be our case as well.

Table IV lists the correlation coefficients of all types of
degree-degree correlations for both original peer social
networks and networks of major peers. Correlations are
measured by calculating Pearson's correlation coefficient $r$
for the degrees at either side of an edge:

$$ r_{outin} = \frac{\langle k_{out} k_{in} \rangle - \langle k_{out} \rangle \langle k_{in} \rangle}{\sqrt{\langle k_{out}^2 \rangle - \langle k_{out} \rangle^2} \, \sqrt{\langle k_{in}^2 \rangle - \langle k_{in} \rangle^2}} \qquad (1) $$

where $\langle \cdot \rangle$ denotes an average over edges,
shown here for the out degree at one end of an edge and the
in degree at the other; the remaining coefficients are
obtained with the corresponding degree types.
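
Eq. (1) can be evaluated directly from an edge list. The sketch below is a minimal illustration in which the degree taken at the start of each edge and the degree taken at its end are passed in as separate mappings, so that any of the directed coefficients can be obtained by swapping the mappings. Which end carries which degree type in the coefficients of Table IV is not spelled out in the text, so the convention used in the example (out degree of the start peer, in degree of the end peer) is an assumption.

```python
import math
from collections import Counter


def degree_correlation(edges, start_degree, end_degree):
    """Pearson correlation, over edges, of a degree at each end of the edge.

    Raises ZeroDivisionError if either degree sequence has zero variance.
    """
    xs = [start_degree[a] for a, _ in edges]
    ys = [end_degree[b] for _, b in edges]
    n = len(edges)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    return (mean_xy - mean_x * mean_y) / math.sqrt(var_x * var_y)


# Toy network: H requests from a, b and c; x requests only from a.
edges = [("H", "a"), ("H", "b"), ("H", "c"), ("x", "a")]
k_out = Counter(a for a, _ in edges)  # Counter yields 0 for absent peers
k_in = Counter(b for _, b in edges)
r = degree_correlation(edges, k_out, k_in)
print(round(r, 4))
```

In this toy network the most active requester is mostly linked to providers with few requesters, so the coefficient is negative, i.e., the mixing is disassortative.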



Similar to Internet dating communities, peer social networks
present disassortative mixing when directions are not
considered in peer connections. Positive mixing is shown for
$r_{inout}$ and $r_{inin}$ in most social networks,
suggesting that active requesters (with a high $k_{out}$)
tend to associate with active providers (with a high
$k_{in}$), and even active providers tend to associate with
each other. Between major peers that both provide and request
resources, active requesters also have a preference towards
each other. It is not surprising that $r_{outin}$ is always
negative in both original and major peer networks, which
means that providers with many requesters are actually less
often associated with frequent requesters. The generally
disassortative mixing of peer social networks suggests that
peer networks in general are vulnerable to targeted attacks
on the highest degree peers, but a few attacks on some
providers may not destroy the network connectivity due to the
existence of other providers in the core group.

IV. CONCLUSIONS AND FUTURE WORK 

This paper presents the first study on social associations of
distributed peers in Peer-to-Peer networks. Several peer
social networks have been constructed from real user data
collected from the Gnutella system. Basic properties of the
social networks, including degree distributions, local
topological quantities and degree-degree correlations, have
been studied. The results show that peer social networks are
small world networks, as peers are clustered and the path
length between them is small. Moreover, most of the peers
(nearly 98.5%) are pure resource providers, contributing to
the high resource reliability and availability of P2P
networks in resource sharing. By comparison, free riding
peers that do not contribute any resources are only a small
fraction (less than 1%) of the whole network. For peers that
have more than one connection, the undirected, directed (out
and in) and weighted degree distributions follow a clear
power law. The exponents are greater than 2 for undirected
and in degrees and nearly 1 for out degrees. Investigations
of betweenness and correlations suggest that the dynamics of
peer social networks are not dominated by a few highly
connected peers. In fact, peer degrees are generally
disassortatively mixed, except for some $r_{inin}$ and
$r_{inout}$, suggesting that active providers are connected
with each other and with active requesters.

The collected social networks studied in this paper are only
small snapshots of the large-scale and continuously changing
P2P networks. However, the kind of study performed here
allows us to touch upon real network topologies that are
difficult to obtain with existing network models. The
analysis results will give useful hints for the future design
of effective P2P systems, by considering their acyclic
topologies and small world architecture. In the future, the
joint relation of the social network topology and the
topology of the underlying peer-to-peer network (e.g.,
Gnutella) will be studied to examine their commonalities and
discrepancies. On top of the kind of network found in this
study, simulations can be carried out to investigate
spreading processes, modeling of traffic flow [21] and
optimization of network resources. Based on the current study
of peer betweenness and degree correlations, we will further
investigate network hierarchy, peer work load and dynamic
properties of P2P social networks.